Laplace approximation & free-action

$\DeclareMathOperator{\vec}{vec}$

§ Free-action

Free-action is the time-integral of free-energy, comprising an energy term, $\overline U$ (the internal action), and the entropy, $\overline H$, of the approximate density:

$$ \begin{eqnarray*} \overline F &=& \overline U + \overline H\\ \overline U &=& \int dt \langle L(\psi)\rangle_{q_\psi} = \int dt \langle \ln p(y, \psi)\rangle_{q_\psi}\\ \overline H &=& -\int dt \langle \ln q(\psi)\rangle_{q_\psi} \end{eqnarray*} $$

§ Mean-field assumption

Now let the unknown quantities factorise into states, parameters, and hyperparameters, $ \psi \to \{u(t), \theta, \lambda\}$,

and let the approximate density over these variables obey a mean-field assumption, $q(\psi) = q(u, t)\,q(\theta)\,q(\lambda)$.

Here $\theta$ parameterises the first moment of the states and is independent of $\lambda$, which parameterises the second moment.

Write

$$ \begin{eqnarray*} \overline U &=& \int dt \langle L(u, t, \theta, \lambda)\rangle_{q_uq_\theta q_\lambda}\\ \overline H &=& -\int dt \langle \ln q(u, t)\rangle_{q_u} - \langle \ln q(\theta)\rangle_{q_\theta} - \langle \ln q(\lambda)\rangle_{q_\lambda} \end{eqnarray*} $$

§ Laplace approximation

Let

$$ \begin{eqnarray*} q(u, t) &=& N(\mu_u(t), \Sigma_u(t))\\ q(\theta) &=& N(\mu_\theta, \Sigma_\theta)\\ q(\lambda) &=& N(\mu_\lambda, \Sigma_\lambda) \end{eqnarray*} $$

Naturally, using the entropy of a multivariate Gaussian, $\frac 1 2 \ln|2\pi e\,\Sigma|$,

$$ \overline H = \frac 1 2\int dt \ln|\Sigma_u| + \frac 1 2 \ln|\Sigma_\theta| + \frac 1 2 \ln|\Sigma_\lambda| + \frac 1 2 \left( ND_u + D_\theta + D_\lambda\right)\ln 2\pi e $$

where $N$ is the number of time samples and $D_u$, $D_\theta$, $D_\lambda$ are the dimensions of the respective variables.
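As a numerical sanity check (a sketch, not part of the derivation; all variable names are illustrative), the Gaussian entropy terms above can be compared against scipy:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
D = 3
A = rng.standard_normal((D, D))
Sigma = A @ A.T + D * np.eye(D)     # a random SPD covariance

# Entropy of N(mu, Sigma): 1/2 ln|Sigma| + (D/2) ln(2 pi e)
H = 0.5 * np.linalg.slogdet(Sigma)[1] + 0.5 * D * np.log(2 * np.pi * np.e)

H_ref = multivariate_normal(mean=np.zeros(D), cov=Sigma).entropy()
print(abs(H - H_ref))
```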

For the internal action, $\overline U$, find its second-order truncation around its mode $\mu = \left[\mu_u, \mu_\theta, \mu_\lambda\right]^T$ (ignoring bilinear terms).

$$ \DeclareMathOperator{\tr}{tr} \begin{eqnarray*} \overline U &=& \int L(\mu, t) \\ &&+ \left\langle \frac 1 2\left[ (u - \mu_u)^T L^{(uu)} (u - \mu_u) \right.\right.\\ && +\left.\left. (\theta - \mu_\theta)^T L^{(\theta\theta)} (\theta - \mu_\theta) \right.\right.\\ && +\left.\left. (\lambda - \mu_\lambda)^T L^{(\lambda\lambda)} (\lambda - \mu_\lambda) \right] \right\rangle_{q_u q_\theta q_\lambda}dt\\ &=& \int L(\mu, t) + \frac 1 2\tr\left(\Sigma_u L^{(uu)}\right) + \frac 1 2\tr\left(\Sigma_\theta L^{(\theta\theta)}\right) + \frac 1 2\tr\left(\Sigma_\lambda L^{(\lambda\lambda)}\right)dt \end{eqnarray*} $$
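The step from the expected quadratic forms to the trace terms uses $\langle \delta^T H \delta\rangle = \tr(\Sigma H)$ for $\delta \sim N(0, \Sigma)$. A minimal Monte-Carlo sketch of this identity (all names illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
D = 4
A = rng.standard_normal((D, D))
Sigma = A @ A.T + np.eye(D)          # covariance of q
B = rng.standard_normal((D, D))
H = -(B @ B.T)                       # stand-in for the Hessian L^(uu)
mu = rng.standard_normal(D)

x = rng.multivariate_normal(mu, Sigma, size=200_000)
d = x - mu
mc = 0.5 * np.einsum('ni,ij,nj->n', d, H, d).mean()   # <1/2 d^T H d>_q
exact = 0.5 * np.trace(Sigma @ H)
print(mc, exact)
```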

Finding conditional precision

Solve $\partial\overline F/\partial{\Sigma_u(t)}=0$; the time-integrals drop because the variation is pointwise in $t$

$$ \begin{eqnarray*} &&\frac 1 2 L^{(uu)}(\mu, t) + \frac 1 2 \Sigma^{-1}_u(t) = 0\\ \Rightarrow && \Sigma^{-1}_u(t) = - L^{(uu)}(\mu, t) \end{eqnarray*} $$

Similarly,

$$ \begin{eqnarray*} \Sigma^{-1}_\theta &=& -\int dt L^{(\theta\theta)}(\mu, t)\\ \Sigma^{-1}_\lambda &=& -\int dt L^{(\lambda\lambda)}(\mu, t) \end{eqnarray*} $$
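A numerical sketch of this fixed-point result: for a Gaussian log-density, the negative Hessian at the mode is exactly the precision. All names here are illustrative; the Hessian is taken by finite differences.

```python
import numpy as np

rng = np.random.default_rng(2)
D = 3
A = rng.standard_normal((D, D))
S = A @ A.T + np.eye(D)              # covariance
P = np.linalg.inv(S)                 # true precision
m = rng.standard_normal(D)

def logp(x):                         # log N(x; m, S), up to a constant
    e = x - m
    return -0.5 * e @ P @ e

h = 1e-5
I = np.eye(D)
Hess = np.zeros((D, D))
for i in range(D):
    for j in range(D):
        ei, ej = h * I[i], h * I[j]
        Hess[i, j] = (logp(m + ei + ej) - logp(m + ei - ej)
                      - logp(m - ei + ej) + logp(m - ei - ej)) / (4 * h * h)

print(np.abs(-Hess - P).max())       # Sigma^{-1} = -L^{(uu)} at the mode
```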

Note that

$$ \begin{eqnarray*} L(u, t, \theta, \lambda) &=& L(u, t|\theta, \lambda) + L(\theta) + L(\lambda)\\ L^{(uu)} &=& L^{(uu)}(u, t| \theta, \lambda)\\ L^{(\theta\theta)} &=& L^{(\theta\theta)}(u, t|\theta, \lambda) + L^{(\theta\theta)}(\theta)\\ L^{(\lambda\lambda)} &=& L^{(\lambda\lambda)}(u, t|\theta, \lambda) + L^{(\lambda\lambda)}(\lambda) \end{eqnarray*} $$

§ Variational action under Laplace approximation

With the following notation, one can write down the variational actions: the internal action expected under the approximate densities of the other variables (their respective Markov blanket).

$$ \begin{align} L_u &= L(u, t, \mu_\theta, \mu_\lambda)\\ L_\theta &= L(\mu_u, t, \theta, \mu_\lambda)\\ L_\lambda &= L(\mu_u, t, \mu_\theta, \lambda) \end{align} $$
$$ \begin{align} {V}_u = V(u, t) &= \langle L(u, t, \theta, \lambda)\rangle_{q_\theta q_\lambda}\\ &= L_u + \frac 1 2 \left( \tr[\Sigma_\theta L_\theta^{(\theta\theta)}] + \tr[\Sigma_\lambda L_\lambda^{(\lambda\lambda)}] \right)\\ \overline{V}_\theta = \overline V(\theta) &= \int dt \langle L(u, t, \theta, \lambda)\rangle_{q_u q_\lambda}\\ &= \int L_\theta + \frac 1 2 \left( \tr[\Sigma_u L_u^{(uu)}] + \tr[\Sigma_\lambda L_\lambda^{(\lambda\lambda)}] \right)dt\\ \overline{V}_\lambda = \overline V(\lambda) &= \int dt \langle L(u, t, \theta, \lambda)\rangle_{q_u q_\theta}\\ &= \int L_\lambda + \frac 1 2 \left( \tr[\Sigma_u L_u^{(uu)}] + \tr[\Sigma_\theta L_\theta^{(\theta\theta)}] \right)dt\\ \end{align} $$

The following derivatives of the variational actions will become useful later.

Note that the notation $A\!\!:$ stands for (column-major) matrix vectorisation; e.g., $L\!\!:_\theta^{(\theta\theta)}$ is the vectorisation of $L_\theta^{(\theta\theta)}$

$$ \left( \begin{array}{ccc} L_\theta^{(\theta_{:1}\theta_{:1})} & L_\theta^{(\theta_{:1}\theta_{:2})} & \\ L_\theta^{(\theta_{:2}\theta_{:1})} & L_\theta^{(\theta_{:2}\theta_{:2})} & \\ & & \ddots \end{array} \right) $$

which becomes

$$\{ L_\theta^{(\theta_{:1}\theta_{:1})}, L_\theta^{(\theta_{:2}\theta_{:1})}, \cdots, L_\theta^{(\theta_{:1}\theta_{:2})}, L_\theta^{(\theta_{:2}\theta_{:2})}, \cdots \}^T$$

This is to be contrasted with, say, $L\!\!:_\theta^{(\theta\theta)(u)}$ and $L\!\!:_\theta^{(\theta\theta)(uu)}$, which read

$$ \begin{align} &\left( \begin{array}{c} L\!\!:_\theta^{(\theta\theta)(u_{:1})}\\ L\!\!:_\theta^{(\theta\theta)(u_{:2})}\\ \vdots \end{array} \right), \;\;\;\text{ and}\\ &\left( \begin{array}{ccc} L\!\!:_\theta^{(\theta\theta)(u_{:1}u_{:1})} & L\!\!:_\theta^{(\theta\theta)(u_{:1}u_{:2})} & \\ L\!\!:_\theta^{(\theta\theta)(u_{:2}u_{:1})} & L\!\!:_\theta^{(\theta\theta)(u_{:2}u_{:2})} & \\ & \ddots & \end{array} \right), \end{align} $$

respectively.
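A short numpy sketch of the vectorisation convention, together with the vec-trace identity that the expressions below rely on (illustrative matrices):

```python
import numpy as np

# Column-major vectorisation: stack (1,1),(2,1),...,(1,2),(2,2),...
A = np.array([[1, 4],
              [2, 5],
              [3, 6]])
print(A.flatten(order='F'))          # [1 2 3 4 5 6]

# The vec-trace identity: tr(A B) = vec(A^T)^T vec(B)
B = np.array([[0, 1, 2],
              [3, 4, 5]])
lhs = np.trace(A @ B)
rhs = A.T.flatten(order='F') @ B.flatten(order='F')
print(lhs, rhs)                      # 70 70
```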

Adopting these notations, the differentials of variational actions are

$$ \begin{eqnarray*} V_u^{(u)} &= L_u^{(u)}& +\frac 1 2 \left(I\otimes\Sigma^T\!\!\!:_\theta\right)^T L\!\!:_\theta^{(\theta\theta)(u)} + \frac 1 2 \left(I\otimes\Sigma^T\!\!\!:_\lambda\right)^T L\!\!:_\lambda^{(\lambda\lambda)(u)}\\ V_u^{(uu)} &= L_u^{(uu)}& +\frac 1 2 \left(I\otimes\Sigma^T\!\!\!:_\theta\right)^T L\!\!:_\theta^{(\theta\theta)(uu)} + \frac 1 2 \left(I\otimes\Sigma^T\!\!\!:_\lambda\right)^T L\!\!:_\lambda^{(\lambda\lambda)(uu)} \end{eqnarray*} $$
$$ \begin{eqnarray*} \overline V_\theta^{(\theta)} &= \int L_\theta^{(\theta)}& + \frac 1 2 \left( I\otimes \Sigma^T\!\!\!:_u\right)^T L\!\!:_u^{(uu)(\theta)} + \frac 1 2 \left( I\otimes \Sigma^T\!\!\!:_\lambda\right)^T L\!\!:_\lambda^{(\lambda\lambda)(\theta)}dt\\ \overline V_\theta^{(\theta\theta)} &= \int L_\theta^{(\theta\theta)}& + \frac 1 2 \left( I\otimes \Sigma^T\!\!\!:_u\right)^T L\!\!:_u^{(uu)(\theta\theta)} + \frac 1 2 \left( I\otimes \Sigma^T\!\!\!:_\lambda\right)^T L\!\!:_\lambda^{(\lambda\lambda)(\theta\theta)}dt \end{eqnarray*} $$
$$ \begin{eqnarray*} \overline V_\lambda^{(\lambda)} &= \int L_\lambda^{(\lambda)}& + \frac 1 2 \left( I\otimes \Sigma^T\!\!\!:_u\right)^T L\!\!:_u^{(uu)(\lambda)} + \frac 1 2 \left( I\otimes \Sigma^T\!\!\!:_\theta\right)^T L\!\!:_\theta^{(\theta\theta)(\lambda)}dt\\ \overline V_\lambda^{(\lambda\lambda)} &= \int L_\lambda^{(\lambda\lambda)}& + \frac 1 2 \left( I\otimes \Sigma^T\!\!\!:_u\right)^T L\!\!:_u^{(uu)(\lambda\lambda)} + \frac 1 2 \left( I\otimes \Sigma^T\!\!\!:_\theta\right)^T L\!\!:_\theta^{(\theta\theta)(\lambda\lambda)}dt \end{eqnarray*} $$

§ Optimisation: embedding and mode following

Suppose the time-dependent state, $u$, subsumes its motion up to arbitrarily high order; one may unpack this and write $\tilde u = (u, u', u'', \dots)^T$.

Let this generalised state move along the gradient of the variational energy, so that it catches up with the motion one level above when the gradient vanishes:

$$ \begin{align} \dot{\tilde u} &= V_u^{(u)} + \mathcal D\tilde u \end{align} $$

This way, when $V_u^{(u)} = 0$ (this happens at the mode where $\tilde u = \tilde\mu$), one has

$$ \begin{align} \dot u &= u'\\ \dot u' &= u''\\ \dot u'' &= u'''\\ &\vdots \end{align} $$

Thus, the motion of the mode becomes the mode of the motion. Here, $\mathcal D$ is a differential operator, simply a block shift matrix that maps each order of motion to the next.
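A minimal sketch of $\mathcal D$ as a block shift matrix, assuming the generalised state is stacked as $(u, u', u'', \dots)^T$ with state dimension $d$:

```python
import numpy as np

n, d = 4, 2                          # embedding order, state dimension
# Block shift: maps (u, u', u'', u''') to (u', u'', u''', 0)
D = np.kron(np.eye(n, k=1), np.eye(d))

u_tilde = np.arange(n * d, dtype=float)
print(D @ u_tilde)                   # [2. 3. 4. 5. 6. 7. 0. 0.]
```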

Let us linearise this state motion around its mode, $\tilde\mu$, where the gradient vanishes:

$$ V_u^{(u)} \approx V_u^{(u)}(\tilde\mu) + V_u^{(uu)}(\tilde\mu)(\tilde u - \tilde\mu) = V_u^{(uu)}(\tilde\mu)(\tilde u - \tilde\mu) $$

And let $\tilde\varepsilon = \tilde u - \tilde\mu$, so that $\dot{\tilde\varepsilon} = \dot{\tilde u} - \dot{\tilde\mu} = \dot{\tilde u} - \mathcal D\tilde\mu$, since the mode moves as $\dot{\tilde\mu} = \mathcal D\tilde\mu$.

With substitution, write

$$ \begin{align} \dot{\tilde\varepsilon} &= V_u^{(uu)}\tilde\varepsilon + \mathcal D\tilde u - \mathcal D\tilde\mu\\ &= \left(V_u^{(uu)} +\mathcal D\right)\tilde\varepsilon \end{align} $$

and note $\mathcal J = \left( V_u^{(uu)} + \mathcal D\right) = \partial\dot{\tilde u}/\partial\tilde u$.

Finding conditional expectation (updating scheme)

The updating scheme is again derived from Ozaki's local linearisation, written here for a unit time-step:

$$ \begin{align} \Delta \tilde u &= \left( \exp(\mathcal J) - I\right) \mathcal J^{-1} \dot{\tilde u}\\ &= \left( \exp(\mathcal J) - I\right) \mathcal J^{-1} \left(V_u^{(u)} + \mathcal D\tilde u\right) \end{align} $$
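A sketch of Ozaki's local-linearisation step with an explicit step size $\Delta t$ (set to one in the text). For a linear flow the local linearisation is exact, which gives a convenient check; the flow matrix here is illustrative:

```python
import numpy as np
from scipy.linalg import expm

# Ozaki step: du = (exp(J dt) - I) J^{-1} u_dot.  For a linear flow
# u_dot = A u (so J = A) the step reproduces the exact solution.
A = np.array([[0.0, 1.0],
              [-2.0, -0.5]])
u0 = np.array([1.0, 0.0])
dt = 0.3

J = A
du = (expm(J * dt) - np.eye(2)) @ np.linalg.solve(J, A @ u0)
u_exact = expm(A * dt) @ u0
print(np.abs(u0 + du - u_exact).max())
```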

For the parameters and hyperparameters there is no generalised motion, and letting the ascent run to convergence (so that $\exp(\mathcal J)\to 0$ for negative-definite curvature) reduces the scheme to a Gauss–Newton step

$$ \begin{align} \Delta \theta &= -{\overline V_\theta^{(\theta\theta)}}^{-1} \overline V_\theta^{(\theta)}\\ \Delta \lambda &= -{\overline V_\lambda^{(\lambda\lambda)}}^{-1} \overline V_\lambda^{(\lambda)} \end{align} $$
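On a quadratic action the Gauss–Newton step lands on the mode in a single move, which is easy to verify numerically (a sketch; the quadratic form is illustrative):

```python
import numpy as np

# On a quadratic action V(theta) = -1/2 (theta - m)^T P (theta - m),
# the step -V^{(tt)-1} V^{(t)} reaches the mode in one move.
rng = np.random.default_rng(3)
D = 3
B = rng.standard_normal((D, D))
P = B @ B.T + np.eye(D)
m = rng.standard_normal(D)

theta = np.zeros(D)
grad = -P @ (theta - m)              # V^{(theta)}
hess = -P                            # V^{(theta theta)}
theta = theta - np.linalg.solve(hess, grad)
print(np.abs(theta - m).max())
```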

§ Example 1: Dynamic causal model

$$ \begin{align} y &= g(x, v; \theta) + z,\;\;\; z\sim N(0, \Pi(\lambda)^{-1}_z)\\ \dot x &= f(x, v; \theta) + w,\;\;\; w\sim N(0, \Pi(\lambda)^{-1}_w) \end{align} $$
$$ \begin{align} \ln p(y, u, t, \theta, \lambda) &= L(y|x, v, \theta, \lambda) + L(x|v, \theta,\lambda) + L(\theta) + L(\lambda) \end{align} $$

where $u=(v, x)^T$, and let $p(v)$ be uninformative for now. Let the parameters, $\theta$, and hyperparameters, $\lambda$, be independent and take the following form

$$ \begin{align} p(\theta) &= N(\theta|\eta_\theta, P^{-1}_\theta)\\ p(\lambda) &= N(\lambda|\eta_\lambda, P^{-1}_\lambda) \end{align} $$

which altogether lend the generative density an analytical form (up to additive constants)

$$ \begin{align} L(t) &= - \frac 1 2 \varepsilon_v^T\Pi_z\varepsilon_v - \frac 1 2 \varepsilon_x^T\Pi_w\varepsilon_x - \frac 1 2 \varepsilon_\theta^T P_\theta \varepsilon_\theta - \frac 1 2 \varepsilon_\lambda^T P_\lambda \varepsilon_\lambda + \frac 1 2 \ln|\Pi_z| + \frac 1 2 \ln|\Pi_w| + \frac 1 2 \ln|P_\theta| + \frac 1 2 \ln|P_\lambda| \end{align} $$

Note the log-determinants of the precisions enter with a positive sign, since $\ln N(x; \mu, \Pi^{-1}) = \frac 1 2 \ln|\Pi| - \frac 1 2 \varepsilon^T\Pi\varepsilon + \text{const}$.

where

$$ \begin{align} \varepsilon_v &= y - g(x, v)\\ \varepsilon_x &= \dot x - f(x, v)\\ \varepsilon_\theta &= \theta - \eta_\theta\\ \varepsilon_\lambda &= \lambda - \eta_\lambda \end{align} $$

If one lumps the time-dependent terms together:

$$ \begin{align} \varepsilon_u &= \left( \varepsilon_v, \varepsilon_x \right)^T\\ \Pi &= \left(\begin{array}{cc} \Pi_z & \\ & \Pi_w \end{array}\right) = \left(\begin{array}{cc} C_z^{-1} &\\ & C_w^{-1} \end{array} \right) = C^{-1} \end{align} $$

one writes

$$ \begin{align} L(t) &= - \frac 1 2 \varepsilon_u^T\Pi\varepsilon_u - \frac 1 2 \varepsilon_\theta^T P_\theta \varepsilon_\theta - \frac 1 2 \varepsilon_\lambda^T P_\lambda \varepsilon_\lambda + \frac 1 2 \ln|\Pi| + \frac 1 2 \ln|P_\theta| + \frac 1 2 \ln|P_\lambda| \end{align} $$

And let this expression prescribe generalised motion over its states, write

$$ \begin{align} L(t) &= - \frac 1 2 \tilde{\varepsilon_u}^T \tilde\Pi \tilde{\varepsilon_u} - \frac 1 2 \varepsilon_\theta^T P_\theta \varepsilon_\theta - \frac 1 2 \varepsilon_\lambda^T P_\lambda \varepsilon_\lambda + \frac 1 2 \ln|\tilde\Pi| + \frac 1 2 \ln|P_\theta| + \frac 1 2 \ln|P_\lambda| \end{align} $$

Conditional precision $\pmb\Lambda_u$

For the states, $u$, the conditional precision, $\Lambda_u=-L(t)^{(uu)}$, is (ignoring second derivatives of $g$ and $f$, as in Gauss–Newton)

$$ \tilde{\varepsilon_u^{(u)}}^T \tilde\Pi \tilde{\varepsilon_u^{(u)}} $$

where

$$ \begin{align} \tilde{\varepsilon_u^{(u)}} &= \left( \begin{array}{cc} \tilde{\varepsilon_v^{(v)}} & \tilde{\varepsilon_v^{(x)}} \\ \tilde{\varepsilon_x^{(v)}} & \tilde{\varepsilon_x^{(x)}} \end{array} \right)\\ \tilde{\varepsilon_v^{(v)}} &= \left( \begin{array}{c} \varepsilon_v^{(\tilde v)}\\ \varepsilon_{v'}^{(\tilde v)}\\ \varepsilon_{v''}^{(\tilde v)}\\ \vdots \end{array} \right) = \left( \begin{array}{cccc} \varepsilon_v^{(v)} & \varepsilon_v^{(v')} & \varepsilon_v^{(v'')} & \\ \varepsilon_{v'}^{(v)} & \varepsilon_{v'}^{(v')} & \varepsilon_{v'}^{(v'')} & \cdots \\ \varepsilon_{v''}^{(v)} & \varepsilon_{v''}^{(v')} & \varepsilon_{v''}^{(v'')} & \\ & \vdots & & \ddots \end{array} \right) = -I \otimes g^{(v)}\\ \tilde{\varepsilon_x^{(x)}} &= \left[\mathcal D\tilde x - \tilde f\right]^{(\tilde x)} = \mathcal D - I \otimes f^{(x)} \end{align} $$

Thus,

$$ \tilde{\varepsilon_u^{(u)}} = -\left( \begin{array}{cc} I\otimes g^{(v)} & I\otimes g^{(x)} \\ I\otimes f^{(v)} & I\otimes f^{(x)} - \mathcal D \end{array} \right) $$
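A sketch assembling this block Jacobian for an illustrative linear model, $g = G_v v + G_x x$ and $f = F_v v + F_x x$ (an assumption chosen so the Kronecker form is exact), checked against finite differences:

```python
import numpy as np

rng = np.random.default_rng(4)
dv = dx = dy = 2                     # dimensions of v, x, y
n = 3                                # embedding order
Gv, Gx = rng.standard_normal((dy, dv)), rng.standard_normal((dy, dx))
Fv, Fx = rng.standard_normal((dx, dv)), rng.standard_normal((dx, dx))
Dx = np.kron(np.eye(n, k=1), np.eye(dx))     # shift matrix on x-blocks

y = rng.standard_normal(n * dy)

def eps(u):                          # stacked (eps_v, eps_x)
    v, x = u[:n * dv], u[n * dv:]
    ev = y - (np.kron(np.eye(n), Gv) @ v + np.kron(np.eye(n), Gx) @ x)
    ex = Dx @ x - (np.kron(np.eye(n), Fv) @ v + np.kron(np.eye(n), Fx) @ x)
    return np.concatenate([ev, ex])

# Analytic Jacobian, as in the block expression above
Jtop = np.hstack([np.kron(np.eye(n), Gv), np.kron(np.eye(n), Gx)])
Jbot = np.hstack([np.kron(np.eye(n), Fv), np.kron(np.eye(n), Fx) - Dx])
J = -np.vstack([Jtop, Jbot])

# Finite-difference check
u0 = rng.standard_normal(n * (dv + dx))
h = 1e-6
Jfd = np.column_stack([(eps(u0 + h * e) - eps(u0 - h * e)) / (2 * h)
                       for e in np.eye(len(u0))])
print(np.abs(J - Jfd).max())
```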

Conditional precision $\pmb\Lambda_\theta$

Conditional precision over the parameters, $\Lambda_\theta = -L(t)^{(\theta\theta)}$, is

$$ \tilde{\varepsilon_u^{(\theta)}}^T \tilde\Pi \tilde{\varepsilon_u^{(\theta)}} + P_\theta $$

Let $\theta = (\theta_{:1}, \theta_{:2}, \dots, \theta_{:k}, \dots, \theta_{:K})^T$.

$$ \begin{align} \tilde{\varepsilon_u^{(\theta_{:k})}} &= \left( \begin{array}{c} \tilde y - \tilde g\\ \mathcal D\tilde x - \tilde f \end{array} \right)^{(\theta_{:k})} = -\left( \begin{array}{c} \tilde{g^{(\theta_{:k})}}\\ \tilde{f^{(\theta_{:k})}} \end{array} \right)\\ &= -\left( \begin{array}{cc} I\otimes g^{(v\theta_{:k})} & I\otimes g^{(x\theta_{:k})} \\ I\otimes f^{(v\theta_{:k})} & I\otimes f^{(x\theta_{:k})} \end{array} \right) \left( \begin{array}{c} \tilde v\\ \tilde x \end{array} \right) \end{align} $$

Conditional precision $\pmb\Lambda_\lambda$

Conditional precision over the hyperparameters, $\Lambda_\lambda = -L(t)^{(\lambda\lambda)}$, has elements, for $\lambda_i, \lambda_j\in\lambda$,

$$ \begin{align} &-\frac{\partial^2}{\partial\lambda_j\partial\lambda_i}\left( -\frac 1 2 \tilde{\varepsilon_u}^T\tilde\Pi\tilde{\varepsilon_u} + \frac 1 2 \ln|\tilde\Pi|\right)\\ =& \frac 1 2 \frac{\partial}{\partial\lambda_j}\left( \tilde{\varepsilon_u}^T \tilde\Pi^{(\lambda_i)} \tilde{\varepsilon_u} - \tr\left[ \tilde C \tilde\Pi^{(\lambda_i)} \right] \right)\\ =& \frac 1 2 \tr\left[ \tilde C \tilde\Pi^{(\lambda_j)} \tilde C \tilde\Pi^{(\lambda_i)} \right]\;\;\;\;(\text{if } \partial^2\tilde\Pi/\partial\lambda\partial\lambda = 0) \end{align} $$

where

$$ \begin{align} & \tilde{\varepsilon_u}^T \frac{\partial}{\partial\lambda_i}\tilde\Pi \tilde{\varepsilon_u}\\ &= \vec(\tilde{\varepsilon_u}\tilde{\varepsilon_u}^T)^T \vec(\tilde{\Pi}^{(\lambda_i)})\\ &= \tr\left[ \tilde{\varepsilon_u} \tilde{\varepsilon_u}^T \tilde{\Pi}^{(\lambda_i)} \right] \end{align} $$
$$ \begin{align} \frac{\partial}{\partial\lambda_i}\ln|\tilde\Pi| &= \tr\left[ \tilde\Pi^{-1} \frac{\partial\tilde\Pi}{\partial\lambda_i} \right]\\ &= \tr\left[\tilde C \tilde\Pi^{(\lambda_i)}\right] \end{align} $$
$$ \begin{align} \frac{\partial^2}{\partial\lambda_j\partial\lambda_i}\ln|\tilde\Pi| &= \frac{\partial}{\partial\lambda_j} \tr\left[\tilde C \tilde\Pi^{(\lambda_i)}\right]\\ &= \tr\left[ -\tilde C \tilde\Pi^{(\lambda_j)} \tilde C \tilde\Pi^{(\lambda_i)} + \tilde C \tilde\Pi^{(\lambda_i\lambda_j)} \right]\\ &= -\tr\left[ \tilde C \tilde\Pi^{(\lambda_j)} \tilde C \tilde\Pi^{(\lambda_i)} \right]\;\;\;\;( \text{if $\partial^2 \tilde\Pi/\partial\lambda\partial\lambda = 0$}) \end{align} $$

using $\partial\tilde C/\partial\lambda_j = -\tilde C \tilde\Pi^{(\lambda_j)}\tilde C$.
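These log-determinant derivatives rest on Jacobi's formula, $\partial\ln|\Pi|/\partial\lambda = \tr[\Pi^{-1}\,\partial\Pi/\partial\lambda]$. A finite-difference sketch, assuming for illustration the common parameterisation $\Pi(\lambda) = e^\lambda Q$ (under which the derivative is exactly the dimension $D$):

```python
import numpy as np

rng = np.random.default_rng(5)
D = 3
B = rng.standard_normal((D, D))
Q = B @ B.T + np.eye(D)              # an illustrative SPD base precision

def logdet_Pi(lam):                  # ln|Pi(lam)| with Pi(lam) = e^lam Q
    return np.linalg.slogdet(np.exp(lam) * Q)[1]

lam = 0.7
C = np.linalg.inv(np.exp(lam) * Q)   # covariance Pi^{-1}
dPi = np.exp(lam) * Q                # dPi/dlam
analytic = np.trace(C @ dPi)         # Jacobi's formula (= D here)

h = 1e-6
fd = (logdet_Pi(lam + h) - logdet_Pi(lam - h)) / (2 * h)
print(analytic, fd)
```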

Conditional expectation over state (updating scheme)

Before calling upon Ozaki's scheme, one recalls that the observation, which enters the variational energy as well, has to be considered:

$$ \left(\begin{array}{c} \dot{\tilde y}\\ \dot{\tilde u} \end{array}\right) = \left(\begin{array}{c} \mathcal D\tilde y\\ V_u^{(u)} + \mathcal D \tilde u \end{array}\right) \implies \mathcal J = \left(\begin{array}{cc} \partial\dot{\tilde y}/\partial\tilde y & \partial\dot{\tilde y}/\partial\tilde u \\ \partial\dot{\tilde u}/\partial\tilde y & \partial\dot{\tilde u}/\partial\tilde u \end{array}\right) = \left(\begin{array}{cc} \mathcal D & 0\\ V_u^{(uy)} & V_u^{(uu)} + \mathcal D \end{array}\right) $$

Thus,

$$ \left(\begin{array}{c} \Delta\tilde y\\ \Delta\tilde u \end{array} \right) = \left(\exp(\mathcal J) - I\right) \mathcal J^{-1} \left(\begin{array}{c} \mathcal D\tilde y\\ V_u^{(u)} + \mathcal D\tilde u \end{array} \right) $$

As an aside, the following shows what local linearisation discards when propagating motion to higher orders; for $\dot u = g(u)$,

$$ \begin{align} \dot u &= g(u)\\ \ddot u &= \dot u g^{(u)}\\ \dddot u &= \frac{\partial}{\partial t} \ddot u\\ &= \dot u \frac{\partial}{\partial u} (\dot u g^{(u)})\\ &= \dot u \frac{\partial}{\partial u} (g(u) g^{(u)})\\ &= \underbrace{\dot u g^{(u)}}_{=\ddot u} g^{(u)} + \underbrace{\dot u g(u)g^{(uu)}}_{\text{ignored under local linearisation}} \end{align} $$